Automatic peak identification method

ABSTRACT

The invention described herein details a protocol to improve analysis and peak identification in spectroscopic data. Bayesian methods are used to automatically identify peaks in data sets. After identifying peak shapes, the method tests the hypothesis that a given number of peaks is found within any given data window. If a peak is identified within a given window, then the likelihood function is maximized in order to estimate peak position and amplitude. This process yields a spectrum with high resolution and minimal artifacts. The method described herein is particularly useful for identifying peaks in data sets obtained from spectroscopy.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. Provisional Application Ser. No. 60/664,010, filed Mar. 22, 2005.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. 5R44 CA101479-03 awarded by NIH. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

The invention is related to the field of analyzing data which includes multiple peak data and, in particular, pertains to an automatic method for analyzing data which includes multiple peak data resulting in high resolution estimates.

Work in chromatography, spectroscopy, engineering, pharmacology, and other fields frequently requires analysis of data sets exhibiting multiple peaks. Analysis of multi-peak data is particularly difficult when peaks overlap, or when data are noisy. Peak finding algorithms of many types are used to identify and interpret peaks and peak positions.

Present state-of-the-art peak finders are either based upon 1] simple matched filters, or wavelets, or searches for local maxima in the time series (resulting in less-than-optimal resolution), 2] parameter fitting to peaks using optimization methods, sometimes on the entire time series (such methods are slow), or 3] direct methods which involve significant operator interaction (i.e. by placing cursors over peaks on a computer screen, estimating baselines by eye, etc.). All of these methods have disadvantages. The method described herein runs essentially like a filter, results in much higher resolution estimates of peak positions and amplitudes than the existing state-of-the-art, and is almost completely automatic.

SUMMARY OF THE INVENTION

The invention described herein details a protocol to improve analysis and peak identification in spectroscopic data. Bayesian methods are used to automatically identify peaks in data sets. After identifying peak shapes, the method tests the hypothesis that a given number of peaks is found within any given data window. If a peak is identified within a given window, then the likelihood function is maximized in order to estimate peak position and amplitude.

This process yields a spectrum with high resolution and minimal artifacts. Furthermore, the process is not overly computational intensive, with low delay times required. A method that simply maximizes the likelihood function is likely to generate artifacts, while a method that only tests the hypothesis that a peak is present, without the subsequent maximization of the likelihood function, will yield a spectrum with decreased resolution.

The peak identification method described herein is potentially useful in any data set wherein peaks of interest are distinguished. The method described herein is useful for identifying peaks in data sets obtained from spectroscopy, including but not limited to various forms of absorption and emission spectroscopy, including absorption spectroscopy, UV/VIS spectroscopy, IR spectroscopy, X-ray spectroscopy, X-ray crystallography, Nuclear Magnetic Resonance spectroscopy (NMR), and raman spectroscopy. In particular, the methods described herein are useful for applications in mass spectroscopy, including TOF-SIMS and SELDI-MS. For example, the methods described herein are useful for analyzing TOF-SIMS mass spectroscopic data related to proteomics. The methods described herein are not limited to spectroscopic applications; for example, the methods are particularly useful when used for imaging applications.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a sample spectrum, in which the number of ions is plotted versus the time required for the ions to reach the detector.

FIG. 2 shows four panels from a TOF-SIMS spectrum, with each successive panel corresponding to successive manipulations of the data.

FIG. 3 depicts a small portion of a TOF/SIMS spectrum of angiotensin.

DETAILED DESCRIPTION OF THE INVENTION

The following symbols are accorded the following meanings:

N total number of data points in the window

s_(j)=s(t_(j)) j=1 . . . N observed signal prior to any processing

x_(j)=x(t_(j)) ideal wavelet shape

η_(j)=η(t_(j)) noise

σ_(η) ²=

ηη^(T)

noise co-variance

Herein we describe the use of Bayesian methods to automatically identify peaks in data sets. We start with a brief overview of the logic. The first step is to compare the hypotheses that there are a given number of peaks (i.e., 0, 1, 2, . . . n) in the observation window. If a peak is present, we then estimate its position and amplitude via parameter fitting.

After this summary of the logic, we develop algebras for two simple cases where Gaussian or Poisson noise is involved. Both of these cases are simple enough that all results can be arrived at in closed form. We first present how to derive a matched filter for a known peak shape that is embedded in additive white (Gaussian) noise of unknown amplitude. We then show the similar procedure can be applied to Poisson noised data. Some preliminary results on SELDI-MS (Surface Enhanced Laser Desorption Ionization Mass Spectrometer) and TOF-SIMS (Time of Flight Secondary Ionization Mass Spectrometer) data are shown. It is easy to add color (finite time correlations) to the noise. Once these simple situations are understood, extending to other types of noise, such combinations of Poisson, Gaussian, and coherent ‘noise’ processes, is logically straightforward, though algebraically more involved. All of the methods described herein require knowledge of the ‘target’ peak shape.

FIG. 1 shows a sample mass spectrum, in which the number of ions of was plotted versus the time that ions take to fly to detector. Ions of a specific m/z, ideally, would hit the detector after same time's flight, resulting in sharp peak line shape like a delta function. However, in reality, they enter the drift region with velocity and time distributions of finite width, which results in a finite peak width for ions of a specific m/z.

1.1 Description of the Underlying Assumptions

Here we introduce a few assumptions that underlie the detection of peaks in the spectra.

1.1.1 The Relationship Between Peak Signal and Probability Density

In counting experiments, ideally only one ion of a given mass arrives, or at most a few ions of the same mass arrive, at the detector during a given temporal sampling interval and are counted individually. The measurement is repeated many times under nominally identical conditions and one gradually builds up a portrait of the probability distribution of arrival times for each ion mass. This means that the final time series s(t) that is to be analyzed after all the data has been gathered, is proportional to the conditional probability distribution s(t _(k))Δt∝p(t _(k) |f ₀(v _(*)(m),t _(*)))Δt,  (1) wherein s(t)Δt is the signal amplitude at time t_(k), Δt is the sampling interval, and p(t_(k)|f₀(v_(*)(m),t_(*))) is the probability that an ion of mass m will strike the detector between t_(k) and t_(k)+Δt given that it entered the drift region at time t_(*) with a velocity close to v_(*)(m). If only one ion species is present, the initial velocity distribution, f₀(v_(*)(m),t_(*)) is just a sharply peaked function of v of the form $\begin{matrix} {{{f_{0}\left( {{v_{*}(m)},t_{*}} \right)} = {g\left( \frac{v - {v_{*}(m)}}{\sigma(m)} \right)}},{{\int{\mathbb{d}{{sg}(s)}}} = 1.}} & (2) \end{matrix}$ If a variety of different ions is present, f₀ will be a superposition of terms like (2), properly normalized so that f₀ will have unit total area. This assumes that the shape of the velocity spread is universal, up to translation and rescaling. We will assume v_(*)(m) and σ(m) to be weak functions of the mass, and to scale like 1/√{square root over (m)}. Fixing their functional form is part of the calibration process. In practice, the velocity distributions for each peak will contain information about the dynamics of the ion formation process, which depend upon the ion species. However, if we assume that the ion optics are designed so that it will only pass a narrow range of most possible velocities for any given mass, then maximizing the entropy of the probability distribution f₀ for a single peak, given v_(*) and σ, leads to the choice of a Gaussian for g: $\begin{matrix} {{g\left( \frac{v - v_{*}}{\sigma} \right)} = {\frac{1}{\sqrt{2\pi}\sigma}{\exp\left( {- \frac{\left( {v - v_{*}} \right)^{2}}{2\sigma^{2}}} \right)}}} & (3) \end{matrix}$ The choice of the maximum-entropy distribution is founded upon the principle that it maximizes the number of possible outcomes of repeated observations that are consistent with this assignment of the probability. Hence, it is the least biasing assignment of the probability that is consistent with our limited knowledge of the initial distribution. Transforming from the initial velocity distribution g to the temporal peak shape requires solution of a simple Fokker-Planck equation, and is described later.

1.1.2 The Assumption of Independence

The observed peaks have finite width in time, as showed in FIG. 1. One of the causes is the finite width of the velocity distribution upon entering free flight. However, it is important to realize that ions that arrive between time t_(k) and t_(k)+Δt are independent of those that arrive at any other time. This is because, as stated in previous section, each event/measurement only results in one or at most a few ions of a given m/z. The resulting spectrum is an accumulation of many times of measurements under the same experiment condition. Therefore, the joint probability of arrival of two ions at two different times satisfies p(t _(k) ,t _(j) |f ₀)=p(t _(k) |f ₀)p(t _(j) |f ₀), j≠k.  (4) even for counts associated with the same mass peak. Any correlations in the signal are assumed to be due to the electronics. If such correlations are present (as they are in MALDI and SELDI spectra) then this must be taken into the analysis as part of the model used, as will be discussed later. The assumption of independence, however, underlies the entire analysis that we pursue below in the first few examples.

Because the measurement is repeated many times under ‘identical’ conditions, producing the same f₀ at each repetition (with the ‘clock time’ t reset to zero at the start of each measurement), if a total number of {overscore (n)} ions are counted over the entire course of the measurement, then the total number of counts between t_(k) and t_(k)+Δt is very nearly n _(k) ≈p(t _(k) |f ₀){overscore (n)}  (5)

1.2 Model Comparisons (i.e. Counting Peaks in the Window, Comparing Noise Models)

We now consider the peak of a particular of mass m. We introduce a window that includes only N points in the time series. The width of the window is chosen to be a slowly varying function of the mass so that it typically will run only from the half-maxima to the left and right of a ‘typical’ peak of mass m. The change in window width is due to the nature of time-of-flight apparatus where heavier ions will arrive later in time and show up with wider peak shape. This is a rough-and-ready compromise between the desire to include as much data as possible in the window to improve the sampling statistics, and the realization that peaks will overlap and that our peak shape model is probably not very good out on the tails of the peak.

Focus attention now on a particular window position. We label the window location, t₀ in such a way that if a mass peak appeared in the window with zero width then the likelihood function (defined momentarily) will be maximized at t₀. Because of the mass-dependent asymmetry in peak line shape, t₀ is certainly not at the leading or trailing edge of the window. It is also not located at the center of the window, nor at the maximum of the ‘target’ temporal peak shape, but lies somewhere between these two points at a position which is weakly dependent upon m. In this way, we correct for systematic errors in the estimation of the peak shape due to its mass-dependent asymmetry.

We define the target shape of the temporal peak to be x(t−t₀) which is then sampled at discrete time intervals, giving the N-component vector x=(x ₁ , . . . x _(N))≡(x(t ₁ −t ₀), . . . x(t _(N) −t ₀))  (6) This ‘slides’ with the window and for convenience is normalized to have unit area: $\begin{matrix} {{\sum\limits_{k = 1}^{N}x_{k}} = 1} & (7) \end{matrix}$ This will only introduce a constant correction for the peak amplitude computed later. We now assume that the observed signal s(t) within the window is given by s _(k) =ax _(k)+ξ_(k), k=1,2 . . . N  (8) where a is an unknown amplitude and ξ=(ξ₁, ξ₂, . . . ξ_(N)) is a random process of some appropriate type. If more than one peak is present in the window, this complicates the analysis because more parameters must be fit, but the logic is similar to what we describe below. For counting experiments, we do not observe the signal s(t) directly, but instead count how many times it crosses some detection threshold in a time interval Δt. This is because a single ion striking a micro-channel plate, or a channeltron, will initiate a cascade of electron emission that amplifies the original micro-pulse into one that is macroscopically detectable. The noise process will be Poisson with some appropriately chosen rate, while for analog experiments it is probably a combination of Poisson and Gaussian processes. For the moment, we leave the choice of noise model aside and simply assert that choosing a noise model is part of the model selection process, and choosing between them is part of the hypothesis testing we now describe.

Under the assumption that the ions counts at different times are independent, the probability of observing the particular count sequence n=(n₁, n₂, . . . n_(N)) is simply $\begin{matrix} {\begin{matrix} {{p\left( {{n\text{❘}a},\lambda,t_{0},M} \right)} = {p\left( {{\left\{ {n_{1},n_{2},{\ldots\quad n_{N}}} \right\}\text{❘}a},\lambda,t_{0},M} \right)}} \\ {= {\prod\limits_{k = 1}^{N}{p\left( {{n_{k}\text{❘}a},\lambda,t_{0},M} \right)}}} \end{matrix}\quad} & (9) \end{matrix}$ By the notation λ we indicate a parameter that characterizes the noise process (e.g. the variance σ for a Gaussian process or the ‘dark’ rate r₀ for a Poisson process). By M we mean to emphasize a particular choice of model class, including a peak shape x and a noise model. The probability (9) is the likelihood function of observing the data in the window labeled t₀, given the model class M and the parameters.

We first want to decide whether there is a peak in the window or not. This is a comparison between two hypotheses:

-   -   H₀=There is no peak in the widow t₀. The data are noise of the         assumed type. Let's call the associated pure-noise model M₀.     -   H₁=There is a single peak in the widow t₀ with the shape x.         Deviations from this shape in the data are due to noise of the         assumed type. Let's call the associated peak-plus-noise model         M₁.         We wish to compute the probability of which each of these         hypotheses was true, given the data, and then take the ratio         $\begin{matrix}         {\frac{p\left( {H_{1}\text{❘}n} \right)}{p\left( {H_{0}\text{❘}n} \right)} = \frac{p\left( {{M_{1}\text{❘}n},t_{0}} \right)}{p\left( {{M_{0}\text{❘}n},t_{0}} \right)}} & (10)         \end{matrix}$         If this ratio is large compared to one, we can be confident that         there is probably a peak in the window, while if it is         approximately one we interpret that as saying there is only weak         evidence of a peak in the window (because a=0 is a possible         estimate of the peak amplitude, which we interpret as ‘no         peak’). Therefore, one may set an appreciable threshold for peak         detection. In order to derive (10) from (9), we first invoke         Bayes' theorem which states: $\begin{matrix}         {{{p\left( {A\text{❘}B} \right)}{p(B)}} = {\left. {{p\left( {B\text{❘}A} \right)}{p(A)}}\Rightarrow{p\left( {A\text{❘}B} \right)} \right. = \frac{{p\left( {B\text{❘}A} \right)}{p(A)}}{p(B)}}} & (11)         \end{matrix}$         Therefore, identifying A as the model class M_(k), and B as the         observed data in the time window t₀: $\begin{matrix}         {{p\left( {{M_{k}\text{❘}n},t_{0}} \right)} = \frac{{p\left( {{n\text{❘}M_{k}},t_{0}} \right)}{p\left( {M_{k}\text{❘}t_{0}} \right)}}{p\left( {n\text{❘}t_{0}} \right)}} & (12)         \end{matrix}$         Since we are given that we observed the data n in the window t₀,         the denominator is unity. If we have no reason to prefer one         model class over another, we should assign them all equal prior         probabilities. For example, if we are comparing two types of         models (a single peak vs. no peak), then p(M₀|t₀)=p(M₁|t₀)=½.         Therefore, we have the simple result that $\begin{matrix}         {{p\left( {{M_{k}\text{❘}n},t_{0}} \right)} = {\frac{1}{2}{p\left( {{n\text{❘}M_{k}},t_{0}} \right)}}} & (13)         \end{matrix}$         Hence, we need to calculate the likelihood that we observed the         data given the model class, p(n|M_(k),t₀), a quantity called the         evidence. We get this by integrating the likelihood function (9)         over the model parameters using an appropriate prior for the         parameters:         p(n|M _(k) ,t ₀)=∫dadλp(n|a,λ,M _(k) ,t ₀)p(a,λ|M _(k) ,t         ₀)  (14)         where p(a, λ|M_(k), t₀) is prior distributions of parameters (a,         λ). For each model class, there will be a prior probability         distribution for the parameters. For example, when no peak is         present we choose as the prior: $\begin{matrix}         {{p\left( {a,{\lambda\text{❘}t_{0}},M_{0}} \right)} = {\frac{1}{2}{\delta(a)}{p\left( {{\lambda\text{❘}t_{0}},M_{0}} \right)}}} & (15)         \end{matrix}$         where the ½ factor appears because the δ-function will be         integrated only over the positive values of a. If we know         nothing about the values of λ, then we choose a uniform prior,         or some other prior that is very broad in λ-space on the grounds         that when we integrate against (9) only the neighborhood of the         maximum likelihood value of λ will contribute.

The likelihood of the data given the model (14) will be a bit less mysterious when we calculate for specific models momentarily. Before doing so, we summarize that to compare the hypotheses that there is, or is not, a peak in the window, we need to compute the ratio (10), which requires computation of (14) for each model. Setting the threshold for the level of evidence we require for ‘peak detection’ is a separate issue. Given that we have detected that a peak lies within a certain region of the time axis, we then fix the position and amplitude of the peak by maximizing the likelihood over the parameters (a, λ, t₀), as discussed in next section. Notice, however, that maximizing over t₀ has a different logical character than the other parameters, because we are comparing different data sets as we slide the window across the peak. The justification is based upon the physical reasonableness of the approach: the width of the window is large compared to the uncertainty in the position of the peak, hence near the maximum of the likelihood, most of the data being compared comes from overlapping windows.

1.3 Parameter Fitting

As the window sliding across the spectrum one point at a time, the ratio (10) is calculated for each window position. We can then justify in which region in the spectrum we are confident that there is a peak. We then look into particular regions of interest, fit parameters (a, λ) by maximizing equation (9), i.e. solve the following equations, for each window t₀: $\begin{matrix} {{\frac{\partial L}{\partial a} = 0}{\frac{\partial L}{\partial\lambda} = 0}} & (16) \end{matrix}$ where L(t₀), the natural logarithm of likelihood function (9): L(t ₀)=1n(p(n|a,λ,M _(k) ,t ₀)  (17)

Solving equations (16) gives the maximum likelihood estimations of parameters (a*, λ*). If the data is very informative, then the likelihood would sharply peak around the point (a*, λ*) in the parameter space. It is natural to Taylor expand (17) around (a*, λ*): $\begin{matrix} {\begin{matrix} {L \approx {{L\left( {a^{*},\lambda^{*}} \right)} + {\frac{1}{2}\begin{bmatrix} \left. \frac{\quad{\partial^{2}\quad L}}{\partial\quad a^{\quad 2}} \middle| {}_{\quad{\quad{a^{*},\quad\lambda^{*}}}}{\left( {a\quad - \quad a^{*}} \right)^{2} + \frac{\quad{\partial^{2}\quad L}}{{\partial a}\quad{\partial\lambda}}} \right|_{\quad{\quad{a^{*},\quad\lambda^{*}}}} \\ \left. {{\left( {a - a^{*}} \right)\left( {\lambda - \lambda^{*}} \right)} + \frac{\quad{\partial^{2}\quad L}}{\partial\quad\lambda^{\quad 2}}} \middle| {}_{\quad{a^{*},\quad\lambda^{\quad*}}}\left( {\lambda - \lambda^{*}} \right)^{2} \right. \end{bmatrix}}}} \\ {= {{L\left( {a^{*},\lambda^{*}} \right)} + {\frac{1}{2}\left( {X - X^{*}} \right)^{\prime}{\nabla{\nabla{L\left( {a^{*},\lambda^{*}} \right)}}}\left( {X - X^{*}} \right)}}} \end{matrix}{{{{where}\quad X} = \begin{bmatrix} a \\ \lambda \end{bmatrix}},\quad{and}}} & (18) \\ {\nabla{\nabla{L\begin{bmatrix} \frac{\partial^{2}L}{\partial a^{2}} & \frac{\partial^{2}L}{{\partial a}{\partial\lambda}} \\ \frac{\partial^{2}L}{{\partial a}{\partial\lambda}} & \frac{\partial^{2}L}{\partial\lambda^{2}} \end{bmatrix}}}} & (19) \end{matrix}$ is the Hessian matrix. It follows from (18) that the likelihood function (9) is approximately: $\begin{matrix} \begin{matrix} {{p\left( {\left. n \middle| a \right.,\lambda,M,t_{0}} \right)} = {\exp\left\lbrack {L\left( t_{0} \right)} \right\rbrack}} \\ {\approx {\exp\left( {{L\left( {a^{*},\lambda^{*}} \right)} + {\frac{1}{2}\left( {X - X^{*}} \right)^{\prime}{\nabla{\nabla{L\left( {a^{*},\lambda^{*}} \right)}}}\left( {X - X^{*}} \right)}} \right)}} \\ {= {{\mathbb{e}}^{L{({a^{*},\lambda^{*}})}}{\mathbb{e}}^{\frac{1}{2}{({X - X^{*}})}^{\prime}{\nabla{\nabla{L{({a^{*},\lambda^{*}})}}}}{({X - X^{*}})}}}} \end{matrix} & (20) \end{matrix}$ This implies that the likelihood function (9) looks like a multivariate normal distribution in parameter space, with uncertainties: $\begin{matrix} {{\sigma_{a} = \left( \left. {- \frac{\partial^{2}L}{\partial a^{2}}} \right|_{a^{*},\lambda^{*}} \right)^{{- 1}/2}}{\sigma_{\lambda} = \left( \left. {- \frac{\partial^{2}L}{\partial\lambda^{2}}} \right|_{a^{*},\lambda^{*}} \right)^{{- 1}/2}}} & (21) \end{matrix}$

Moreover, when compute the evidence, i.e. marginalize likelihood function (9) over prior p(a,λ|M_(k),t₀), as seen in equation (14), if the prior is impendent of (a, λ), for example, a and λ are uniformly distributed in some region (a_(min), a_(max)) and (λ_(min), λ_(max)), substitute (20) into (14) results: $\begin{matrix} \begin{matrix} {{p\left( {\left. n \middle| M_{\quad k} \right.,t_{\quad 0}} \right)} = {\int{{\mathbb{d}a}{\mathbb{d}\lambda}\quad{p\left( {\left. n \middle| a \right.,\lambda,M_{\quad k},t_{\quad 0}} \right)}{p\left( {a,\left. \lambda \middle| M_{\quad k} \right.,t_{\quad 0}} \right)}}}} \\ {= {\int_{\quad a_{\quad\min}}^{\quad a_{\quad\max}}{\int_{\quad\lambda_{\quad\min}}^{\quad\lambda_{\quad\max}}\quad{{\mathbb{d}a}\quad{\mathbb{d}{\lambda\mathbb{e}}^{\quad{L{(\quad{a^{*},\quad\lambda^{*}})}}}}}}}} \\ {{\mathbb{e}}^{\quad{\frac{1}{\quad 2}\quad{({X\quad - \quad X^{*}})}^{\prime}\quad{\nabla{\nabla L}}\quad{(\quad{a^{*},\quad\lambda^{*}})}\quad{({X\quad - \quad X^{*}})}}}} \\ {\frac{1}{a_{\max} - a_{\min}}\frac{1}{\lambda_{\max} - \lambda_{\min}}} \\ {= {\frac{1}{a_{\max} - a_{\min}}\frac{1}{\lambda_{\max} - \lambda_{\min}}{\mathbb{e}}^{L{({a^{*},\lambda^{*}})}}\frac{\left( {2\pi} \right)^{m/2}}{\sqrt{\det\left\lbrack {\nabla{\nabla{L\left( {a^{*},\lambda^{*}} \right)}}} \right\rbrack}}}} \end{matrix} & (22) \end{matrix}$ where m is the dimension of parameter space. In the last step of integration, lower and upper boundaries of integration are extended to infinity. This is doable on the basis that (a_(min), a_(max)) and (λ_(min), λ_(max)) are large enough such that contributions from out side these regions are negligible. Note det[∇∇L(a*,λ*)] is the determinate of Hessian matrix evaluated at (a*, λ*), and 1/det[∇∇L(a*,λ*)] is proportion to the volume within σ_(a) and σ_(λ) around (a*, λ*) in parameter space.

2. Finding a Peak in a Time-of-Flight Mass Spectrum.

We now consider some specific examples, following the discussion above. The first concerns the detection of a peak in white Gaussian noise. We show that the approach described above leads to what is called the ‘matched filter’ in the engineering literature. After that, we move on the applications in ‘counting’ problems, where the noise is due to the discreteness of the underlying process and a Poisson noise model is more appropriate. The later case is of particular interest as it is related to TOF-SIMS (Time of Flight Secondary Ionization Mass Spectrometer). The Gaussian noise case is also important as it may relate to other mass spectrometer, such as SELDI (Surface Enhanced Laser Desorption Ionization). In this section, we do not specify the peak shape for both Gaussian noise case and Poisson noise case, we simply use x_(i) to denote peak shape in general. In section 3, before we present some preliminary results, we will choose a particular peak shape, motivated by the kinematics of TOF-MS instruments.

In the end of this paper, we consider the effects of ‘color’ in the noise, and show that this can easily be incorporated if we know what the correlation structure of the noise is. If we have to estimate it from the data this becomes a difficult problem because of the number of parameters that need to be fit.

Before we go to detailed calculations for Gaussian and Poisson noise cases, let us summarize that, in order to compare model M₀ v.s M₁, we need to evaluate the ratio (10), in which we need to compute (14) by marginalizing (9) over prior distribution p(a,λ|M_(k),t₀)). Parameters are fitted by maximizing the likelihood function (9), via solving equation (16).

For convenience, from now on, except for t₀, when we refer to parameters associated with model M₀, a subscript 0 will be used, and a subscript 1 will be used for parameters associated with model M₁. For both models, a superscript of star * will be used for best estimations of parameters.

2.1 Some Preliminary Results for Gaussian Noise Models

2.1.1 Model Comparison

We assume the noise process is Gaussian and white: $\begin{matrix} {{\left\langle {\eta\left( t_{j} \right)} \right\rangle = \mu},{\left\langle {{\eta\left( t_{j} \right)}{\eta\left( t_{k} \right)}} \right\rangle = \delta_{jk}},{{p_{1}\left( {\eta\left( t_{k} \right)} \right)} = {\frac{1}{\sqrt{2\pi}\sigma_{\eta}}{\exp\left( {- \frac{\left( {{\eta\left( t_{k} \right)} - \mu} \right)^{2}}{2\sigma_{\eta}^{2}}} \right)}}}} & (23) \end{matrix}$ wherein the subscript on the probability density indicates that this is the PDF for a single time step. We use this single-step PDF to compute the joint probability that we would have observed the N-step noise sequence η(t_(k)), k=1, 2 . . . N.

We want to compare the models having a peak in the window (M₁) to those lacking a peak in the window (M₀). Therefore, we need to evaluate (14) for both M₁ and M₀.

We first assume there is no peak present, only noise, and that each time step is independent of the others: $\begin{matrix} {{p_{N}\left( {\left. \eta \middle| \mu_{0} \right.,\sigma_{\eta\quad 0},M_{0}} \right)} = {\prod\limits_{j = 1}^{N}{\frac{1}{\sqrt{2\pi}\sigma_{\eta\quad 0}}{{\exp\left( {{- \frac{1}{2\sigma_{\eta\quad 0}^{2}}}\left( {{\eta\left( t_{j} \right)} - \mu_{0}} \right)^{2}} \right)}.}}}} & (24) \end{matrix}$ The notation η in the argument of the likelihood function represents an N-dimensional vector of noise values. Collecting terms, we find $\begin{matrix} {{p_{N}\left( {\left. \eta \middle| \mu_{0} \right.,\sigma_{\eta\quad 0},M_{0}} \right)} = {\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\quad 0}^{N}}{\exp\left( {{- \frac{1}{2\sigma_{\eta\quad 0}^{2}}}{\sum\limits_{j = 1}^{N}\left( {{\eta\left( t_{j} \right)} - \mu_{0}} \right)^{2}}} \right)}}} & (25) \end{matrix}$ The exponent is a quadratic function of μ₀, therefore we complete the square and write it as $\begin{matrix} {{{\sum\limits_{j = 1}^{N}\left( {{\eta\left( t_{j} \right)} - \mu_{0}} \right)^{2}} = {{N\left( {\mu_{0} - \mu_{*}} \right)}^{2} + {N\quad\sigma_{*}^{2}}}}{where}} & (26) \\ {{\mu_{*} = {\frac{1}{N}{\sum\limits_{j = 1}^{N}\eta_{j}}}},\quad{{{and}\quad\sigma_{*}^{2}} = {\frac{1}{N}{\sum\limits_{j = 1}^{N}\left( {\eta_{j} - \mu_{*}} \right)^{2}}}}} & (27) \end{matrix}$ The likelihood function now becomes: $\begin{matrix} {{p_{N}\left( {\left. \eta \middle| \mu_{0} \right.,\sigma_{\eta\quad 0},M_{0}} \right)} = {\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\quad 0}^{N}}{\exp\left( {- \frac{N\quad\sigma_{*}^{2}}{2\sigma_{\eta\quad 0}^{2}}} \right)}{\exp\left( {- \frac{{N\left( {\mu_{0} - \mu_{*}} \right)}^{2}}{2\sigma_{\eta\quad 0}^{2}}} \right)}}} & (28) \end{matrix}$ From this form it is clear that the likelihood is maximized if we choose μ₀*=μ_(*) and σ_(η0)*=σ_(*), with maximum: $\begin{matrix} {{{p_{N}^{*}(\eta)} \equiv {p_{N}^{*}\left( {\left. \eta \middle| \mu_{0}^{*} \right.,\sigma_{\eta\quad 0}^{*},M_{0}} \right)}} = {\frac{1}{\left( {2\pi\quad e\quad\sigma_{\eta\quad 0}^{*2}} \right)^{N/2}}.}} & (29) \end{matrix}$ Comparing this value for different window positions can be used to measure the likelihood that the signal in each window is a Gaussian white noise process. Comparing different models using those parameter values in each model that maximizse the likelihood may not the most robust way to proceed. Instead, it may be preferable to marginalize over all possible values of the model parameters (in this case μ₀ and σ_(η0)) to get the probability for the model that there is no peak in the window (M₀) given the signal observed in the window, which is parameter-free. Doing the same marginalization for the case where a peak is assumed in the window, gives a parameter free probability for model M₁. These two parameter-free probabilities are directly comparable.

In evaluating equation (14), we now assume that we know nothing about the prior probabilities for the mean and variance of the noise process, aside from the requirement that they be probabilities in the parameters μ₀ and σ_(η0). The simplest choice is to assume that they are constant over some range. Marginalizing, we have: $\begin{matrix} {{p_{N}\left( \eta \middle| M_{0} \right)} = {\int_{0}^{\infty}\quad{\frac{\mathbb{d}\sigma_{\eta\quad 0}}{\sigma_{\eta\quad 0_{\max}}}{\int_{- \infty}^{\infty}\quad{\frac{\mathbb{d}\mu_{0}}{\mu_{0_{\max}}}{p_{N}\left( {\left. \eta \middle| \mu_{0} \right.,\sigma_{\eta\quad 0},M_{0}} \right)}}}}}} & (30) \end{matrix}$ Performing the integral over the parameter μ₀ first we are left with $\begin{matrix} {{p_{N}\left( \eta \middle| M_{0} \right)} = {\frac{1}{\sigma_{\eta\quad 0_{\max}}\mu_{0_{\max}}}\frac{1}{\left( {2\pi} \right)^{{({N - 1})}/2}\sqrt{N}}{\int_{0}^{\infty}\quad{\frac{\mathbb{d}\sigma_{\eta\quad 0}}{\sigma_{\eta\quad 0}^{N - 1}}{\exp\left( {- \frac{N\quad\sigma_{\eta\quad 0}^{*2}}{2\sigma_{\eta 0}^{2}}} \right)}}}}} & (31) \end{matrix}$ Changing variables to $\begin{matrix} {t = \frac{N\quad\sigma_{\eta\quad 0}^{*2}}{2\sigma_{\eta\quad 0}^{2}}} & (32) \end{matrix}$ we find $\begin{matrix} {{p_{N}\left( \eta \middle| M_{0} \right)} = {\frac{1}{2\sqrt{2}\pi^{{({N - 1})}/2}}\frac{\Gamma\left( {\frac{N}{2} - 1} \right)}{N^{\frac{N - 1}{2}}\sigma_{\eta\quad 0}^{{*N} - 2}}\frac{1}{\sigma_{\eta\quad 0_{\max}}\mu_{0_{\max}}}}} & (33) \end{matrix}$ Using Stirling's asymptotic series for the Γ-function (Γ(x+1)≈√{square root over (2πx)}x^(x)e^(−x)(1+ . . . )) $\begin{matrix} \begin{matrix} {{\Gamma\left( {\frac{N}{2} - 1} \right)} \approx {{{\mathbb{e}}^{{- \frac{N}{2}} + 1}\left( {\frac{N}{2} - 1} \right)}^{\frac{N}{2} - 1 - \frac{1}{2}}\sqrt{2\pi}}} \\ {\approx {{{\mathbb{e}}^{{- \frac{N}{2}} + 1}\left( \frac{N}{2} \right)}^{\frac{N - 3}{2}}\sqrt{2\pi}}} \end{matrix} & (34) \end{matrix}$ we find: $\begin{matrix} {{p_{N}\left( \eta \middle| M_{0} \right)} = {{\frac{1}{\left( {2{\pi\mathbb{e}}\quad\sigma_{\eta\quad 0}^{*2}} \right)^{N/2}}\frac{\pi\sqrt{2}\sigma_{\eta\quad 0}^{*2}}{N}} = {p_{N}^{*}\frac{\pi\sqrt{2}\sigma_{\eta\quad 0}^{*2}}{N}\frac{\mathbb{e}}{\sigma_{\max}\mu_{\max}}}}} & (35) \end{matrix}$ Notice that this still depends upon the window position. This result could have been anticipated by the following argument: the integration of the likelihood over μ₀ and σ_(η0) will be dominated by the region of μ₀σ_(η0)-space in the immediate neighborhood of the maximum likelihood values μ₀* and σ_(η0)*. Over most of parameter space the likelihood is vanishingly small (if the data is ‘informative’). The volume of space over which the likelihood is not vanishingly small is given by the product of the half-widths of the likelihood function in each direction (which are essentially the uncertainties in the parameter estimates). The uncertainties can be estimated from the second derivative of the ln of the likelihood function, and are of the order of Δμ₀□σ_(η0)*/√{square root over (N)} and Δσ_(η0)□σ_(η0)*/√{square root over (2N)}. Therefore, we have the appealing result that (35) is simply $\begin{matrix} {{p_{N}\left( \eta \middle| M_{0} \right)} = {2\pi\quad{p_{N}^{*}\left( \frac{{\Delta\mu}_{0}}{\mu_{0_{\max}}} \right)}\left( \frac{{\Delta\sigma}_{\eta\quad 0}}{\sigma_{\eta\quad 0_{\max}}} \right){\mathbb{e}}}} & (36) \end{matrix}$ The results (35) and (29) depend only upon the noise model and are independent of the rest of the model (e.g. peak shapes and positions, if there are any). Equation (33), (35) and (36) are different expressions for evidence.

The computation for p_(N)(η|M₀) is complete. However, we want to point out that there is a way that arrives at (35) and (36) while avoiding above tedious integrals. That is to utilize (22) and (29). The Hessian matrix, as defined in (19), can be easily calculated in case: $\begin{matrix} {{\nabla{\nabla{L\left( {\mu_{0}^{*},\sigma_{\eta\quad 0}^{*}} \right)}}} = \begin{bmatrix} \frac{N}{\sigma_{\eta\quad 0}^{*2}} & 0 \\ 0 & \frac{2N}{\sigma_{\eta\quad 0}^{*2}} \end{bmatrix}} & (37) \end{matrix}$

Substitute (29) and (37) into (22): $\begin{matrix} \begin{matrix} {{p_{N}^{\prime}\left( \eta \middle| M_{0} \right)} \approx {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\quad 0_{\max}}}\frac{2{\pi\sigma}_{\eta\quad 0}^{*2}}{\sqrt{2}N}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\quad 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\ {= {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\quad 0_{\max}}}\frac{\sigma_{\eta\quad 0}^{*}}{\sqrt{N}}\frac{\sigma_{\eta\quad 0}^{*}}{\sqrt{2N}}2\pi\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\quad 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\ {= {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\quad 0_{\max}}}\frac{\sigma_{\eta\quad 0}^{*}}{\sqrt{N}}\frac{\sigma_{\eta\quad 0}^{*}}{\sqrt{2N}}2\pi\quad{p_{N}^{*}\left( {\left. \eta \middle| \mu_{0}^{*} \right.,\sigma_{\eta\quad 0}^{*},M_{0}} \right)}}} \\ {= {\frac{{\Delta\mu}_{0}}{\mu_{0_{\max}}}\frac{{\Delta\sigma}_{\eta\quad 0}}{\sigma_{\eta\quad 0_{\max}}}2\pi\quad{p_{N}^{*}\left( {\left. \eta \middle| \mu_{0}^{*} \right.,\sigma_{\eta\quad 0}^{*},M_{0}} \right)}}} \end{matrix} & (38) \end{matrix}$

Comparing (38) with (36), they differ by a factor e which comes from Stirling series expansion of $\Gamma\left( {\frac{N}{2} - 1} \right)$ in (34).

We now consider there is a peak in the window. Suppose the peak has a line shape described by x_(i), which peaks near the center of the window. The observed data within the window may be written as: s(t _(j))=a ₁ x(t _(j) −t ₀)+η(t _(j)) j=1 . . . N  (39) where the amplitude (a₁) and arrival time (t₀) of the wavelet, as well as the noise amplitude σ_(η1) and mean μ₁, are assumed to be unknown. The marginalization in this case looks like: $\begin{matrix} \begin{matrix} {{p\left( \eta \middle| M_{1} \right)} = {\int{\int{\int{{\mathbb{d}a_{1}}{\mathbb{d}\mu_{1}}{\mathbb{d}\sigma_{\eta\quad 1}}{p_{N}\left( {\left. \eta \middle| a_{1} \right.,\mu_{1},\sigma_{\eta 1},t_{0},M_{1}} \right)}}}}}} \\ {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}} \end{matrix} & (40) \end{matrix}$ where: $\begin{matrix} \begin{matrix} {{p_{N}\left( {\left. \eta \middle| a_{1} \right.,\mu_{1},\sigma_{\eta 1},t_{0},M_{1}} \right)} = {\frac{1}{\left( {\sqrt{2\pi}\sigma_{\eta 1}} \right)^{N}}\exp}} \\ {\left( {{- \frac{1}{2\sigma_{\eta 1}^{2}}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}x_{i}} - u_{1}} \right)^{2}}} \right)} \end{matrix} & (41) \end{matrix}$ is the realization of likelihood function (9). As in the case of no peak, complete the square, let: $u_{*} = {\frac{1}{N}{\sum\left( {s_{i} - {a_{1}x_{i}}} \right)}}$ $\sigma_{\eta^{*}}^{2} = {\frac{1}{N}{\sum\left( {s_{i} - {a_{1}x_{i}} - u_{*}} \right)^{2}}}$ one may get: $\begin{matrix} \begin{matrix} {{p\left( \eta \middle| M_{1} \right)} = {\int{\int{{\mathbb{d}a_{1}}{\mathbb{d}\mu_{1}}{\mathbb{d}\sigma_{\eta 1}}\frac{1}{\left( {\sqrt{2\pi}\sigma_{\eta 1}} \right)^{N}}}}}} \\ {{\mathbb{e}}^{{- \frac{1}{2\sigma_{\eta 1}^{2}}}{N{({\mu_{1} - \mu_{*}})}}^{2}}{\mathbb{e}}^{{- \frac{1}{2\quad\sigma_{\eta 1}^{\quad 2}}}N\quad\sigma_{\eta^{*}}^{2}}} \\ {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}} \\ {= {\frac{1}{\left( \sqrt{2\pi} \right)^{N - 1}\sqrt{N}}{\int{{\mathbb{d}a_{1}}{\mathbb{d}\sigma_{\eta 1}}\frac{1}{\sigma_{\eta 1}^{N - 1}}{\mathbb{e}}^{{- \frac{1}{2\sigma_{\eta 1}^{2}}}N\quad\sigma_{\eta^{*}}^{2}}}}}} \\ {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}} \\ {= {\int{{\mathbb{d}a_{1}}\frac{1}{2\sqrt{2}\left( {N\quad\pi} \right)^{\frac{N - 1}{2}}\sigma_{\eta^{*}}^{N - 2}}{\Gamma\left( {\frac{N}{2} - 1} \right)}}}} \\ {\frac{1}{a_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{1}{\mu_{1_{\max}}}} \end{matrix} & (42) \end{matrix}$ Note, if we let (a₁*, μ₁*, σ_(η1)*) be the point where the likelihood is maximized, the derivation of (a₁*, μ₁*, σ_(η1)*) will be given momentarily, then $\begin{matrix} \begin{matrix} {\sigma_{\eta^{*}}^{2} = {\frac{1}{N}{\sum\left( {s_{i} - {a_{1}x_{i}} - u_{*}} \right)^{2}}}} \\ {= {\frac{1}{N}{\sum\left( {s_{i} - {a_{1}^{*}x_{i}} - u_{1}^{*} - {\left( {a_{1} - a_{1}^{*}} \right)x_{i}} - \left( {u_{*} - u_{1}^{*}} \right)} \right)^{2}}}} \\ {= {\sigma_{\eta 1}^{*} + {\sigma_{x}^{2}\left\lbrack {a_{1} - a_{1}^{*} - \frac{\sum\quad{\left( {s_{i} - {a_{1}^{*}x_{i}} - u_{1}^{*}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}}{\sigma_{x}^{2}N}} \right\rbrack}^{2} -}} \\ {\frac{\left\lbrack {\sum\quad{\left( {s_{i} - {a_{1}^{*}x_{i}} - u_{1}^{*}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}} \right\rbrack^{2}}{\sigma_{x}^{2}N^{2}}} \\ {= {{A\left( {a - a_{1}^{\prime}} \right)}^{2} + B}} \end{matrix} & (43) \end{matrix}$ where $\begin{matrix} {{\overset{\_}{x} = \frac{\sum x_{i}}{N}}{\sigma_{x}^{2} = {\frac{1}{N}{\sum\left( {x_{i} - \overset{\_}{x}} \right)^{2}}}}{A = \sigma_{x}^{2}}{B = {\sigma_{\eta 1}^{*2} - \frac{\left\lbrack {\sum\quad{\left( {s_{i} - {a_{1}^{*}x_{i}} - u_{1}^{*}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}} \right\rbrack^{2}}{\sigma_{x}^{2}N^{2}}}}{a_{1}^{\prime} = {a_{1}^{*} + \frac{\sum\quad{\left( {s_{i} - {a_{1}^{*}x_{i}} - u_{1}^{*}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}}{\sigma_{x}^{2}N}}}} & (44) \end{matrix}$ then marginalization may be written as: $\begin{matrix} \begin{matrix} {{p\left( \eta \middle| M_{1} \right)} = {\int{{\mathbb{d}a}\frac{1}{2\sqrt{2}{\left( {N\quad\pi} \right)^{\frac{N - 1}{2}}\left\lbrack {{A\left( {a - a_{1}^{\prime}} \right)}^{2} + B} \right\rbrack}^{\frac{N - 2}{2}}}}}} \\ {{\Gamma\left( {\frac{N}{2} - 1} \right)}\frac{1}{a_{\max}}\frac{1}{\sigma_{\eta^{\max}}}\frac{1}{\mu_{\max}}} \end{matrix} & (45) \end{matrix}$ Changing variable t=(a₁−a₁′)² gives: $\begin{matrix} {{p\left( \eta \middle| M_{1} \right)} = {\frac{1}{4\sqrt{2}\pi^{\frac{N}{2} - 1}N^{\frac{N - 1}{2}}}\frac{B^{{- \frac{N}{2}} + \frac{3}{2}}}{\sqrt{A}}{\Gamma\left( {\frac{N}{2} - \frac{3}{2}} \right)}}} & (46) \end{matrix}$ Now, having p(η|M₀) and p(η|M₁) computed, one may substitute them into (10) to see which model is more preferable.

Up to this point, the model comparison is complete, as we derived (33) and (46), and they can substitute into (10) to compare two hypothesis. Yet, as in parallel with what we did in M₀ case, we can also arrive at an equation similar to (46) via computing the Hessian matrix, but this is going to need (a₁*, μ₁*, σ_(η1)*), the best estimation of (a₁, μ₁*, σ_(η1)). So, let's temporarily leave here and do parameter fitting first.

2.1.2 Parameter Fitting for Gaussian Noise

Once it is determined which model is more appealing, parameters may be fitted via maximizing appropriate likelihood.

For the case where there is no peak in the window, the likelihood (9) is simply: $\begin{matrix} \begin{matrix} {{p_{N}\left( {\left. \eta \middle| \mu_{0} \right.,\sigma_{\eta 0},M_{0}} \right)} = {\frac{1\quad}{\left( {2\quad\pi} \right)^{N/2}\quad\sigma_{\eta 0}^{N}}\quad\exp}} \\ {\left( {{- \frac{1}{2\quad\sigma_{\eta 0}^{2}}}\quad{\sum\limits_{j = 1}^{N}\left( {{\eta\left( t_{j} \right)} - \mu_{0}} \right)^{2}}} \right)} \end{matrix} & (47) \end{matrix}$ where η(t _(j))=s(t _(j))  (48) as only noise is assumed in presence. As it is pointed out in previous section, (47) is maximized at $\begin{matrix} {{\mu_{0}^{*} = {\frac{1}{N}{\sum s_{i}}}}{\sigma_{\eta 0}^{*} = {\frac{1}{N}{\sum\left( {s_{i} - \mu_{0}} \right)^{2}}}}} & (49) \end{matrix}$

The uncertainties are, as one can see from the Hessian matrix (37): $\begin{matrix} {{{\Delta\mu}_{0} = \frac{\sigma_{\eta 0}^{*}}{\sqrt{N}}}{{\Delta\sigma}_{\eta 0} = \frac{\sigma_{\eta 0}^{*}}{\sqrt{2N}}}} & (50) \end{matrix}$

If it is preferred that there is a peak in the window, then the likelihood function (9) looks like: $\begin{matrix} \begin{matrix} {{p_{N}\left( {\left. \eta \middle| a_{1} \right.,\mu_{1},\sigma_{\eta 1},M_{1}} \right)} = {\frac{1}{\left( {\sqrt{2\pi}\sigma_{\eta 1}} \right)^{N}}{\exp\left( {{- \frac{1}{2\sigma_{\eta 1}^{2}}}{\sum\limits_{i = 1}^{N}\eta_{i}^{2}}} \right)}}} \\ {= {\frac{1}{\left( {\sqrt{2\pi}\sigma_{\eta 1}} \right)^{N}}\exp}} \\ {\left( {{- \frac{1}{2\sigma_{\eta 1}^{2}}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}x_{i}} - u_{1}} \right)^{2}}} \right)} \end{matrix} & (51) \end{matrix}$ the natural logarithm of the likelihood is: $\begin{matrix} \begin{matrix} {L = {\log\quad\left\lbrack {p_{N}\left( {\left. \eta \middle| a_{1} \right.,\mu_{1},\sigma_{\eta 1},M_{1}} \right)} \right\rbrack}} \\ {= {{{- \frac{N}{2}}\log\quad\left( {2\pi} \right)} - {N\quad{\log\left( \sigma_{\eta 1} \right)}} - {\frac{1}{2\sigma_{\eta 1}^{2}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}x_{i}} - \mu_{1}} \right)^{2}}}}} \end{matrix} & (52) \end{matrix}$ In order to maximize this value, let: $\begin{matrix} {{\frac{\partial L}{\partial a_{1}} = {{{- \frac{1}{\sigma_{\eta 1}^{2}}}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}x_{i}} - \mu_{1}} \right)\left( {- x_{i}} \right)}}} = 0}}{\frac{\partial L}{\partial\mu_{1}} = {{{- \frac{1}{\sigma_{\eta 1}^{2}}}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}x_{i}} - \mu_{1}} \right)\left( {- 1} \right)}}} = 0}}{\frac{\partial L}{\partial\sigma_{\eta 1}} = {{{- \frac{N}{\sigma_{\eta 1}}} + {\frac{1}{\sigma_{\eta 1}^{3}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}x_{i}} - \mu_{1}} \right)^{2}}}} = 0}}} & (53) \end{matrix}$ Solving these equations gives: $\begin{matrix} {{\begin{matrix} {a_{1}^{*} = \frac{\overset{\_}{({sx})} - {\overset{\_}{s}\overset{\_}{x}}}{x^{2} - {\overset{\_}{x}}^{2}}} \\ {= \frac{\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)}}{\overset{\_}{\left( {x - \overset{\_}{x}} \right)^{2}}}} \\ {= \frac{\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)}}{\sigma_{x}^{2}}} \end{matrix}\mu_{1}^{*} = {\overset{\_}{s} - {a_{1}^{*}\overset{\_}{x}}}}{\sigma_{\eta 1}^{*} = {\overset{\_}{\left( {s_{i} - \overset{\_}{s} - {a_{1}\left( {x_{i} - \overset{\_}{x}} \right)}} \right)^{2}} = {\sigma_{s}^{2} - {a_{1}^{2}\sigma_{x}^{2}}}}}} & (54) \end{matrix}$

The log of the likelihood function at (a₁*, μ₁*, σ_(η1)*) is simply: $\begin{matrix} {{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}} \right)} = {{{- N}\quad{\log\left( \sigma_{\eta 1}^{*} \right)}} - {\frac{N}{2}{\log\left( {2\pi} \right)}} - \frac{N}{2}}} & (55) \end{matrix}$

Now let come back to the issue we left before parameter fitting.

-   -   The elements in Hessian matrix ∇∇L(a₁,μ₁,σ_(η1)) evaluated at         (a₁*, μ₁*, σ_(η1)*) are: $\begin{matrix}         {{\left. {- \frac{\partial^{2}L}{\partial a_{1}^{2}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {{\frac{1}{\sigma_{\eta 1}^{*2}}{\sum\limits_{i = 1}^{N}x_{i}^{2}}} = {\left. {\frac{N\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}{\sigma_{\eta 1}^{*2}} - \frac{\partial^{2}L}{\partial\mu_{1}^{2}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {{\frac{1}{\sigma_{\eta 1}^{*2}}{\sum\limits_{i = 1}^{N}(1)}} = \frac{N}{\sigma_{\eta 1}^{*2}}}}}}{\left. {\begin{matrix}         {\left. {- \frac{\partial^{2}L}{\partial\sigma_{\eta 1}^{2}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {{- \frac{N}{\sigma_{\eta 1}^{*2}}} + {\frac{3}{\sigma_{\eta 1}^{*4}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}^{*}x_{i}} - \mu_{1}^{*}} \right)^{2}}}}} \\         {= \frac{2N}{\sigma_{\eta 1}^{*2}}}         \end{matrix} - \frac{\partial^{2}L}{{\partial a_{1}}{\partial\mu_{1}}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {\left. {\frac{\sum\limits_{i = 1}^{N}x_{i}}{\sigma_{\eta 1}^{*2}} - \frac{\partial^{2}L}{{\partial\mu_{1}}{\partial\sigma_{\eta 1}}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {{\frac{2}{\sigma_{\eta 1}^{*3}}{\sum\limits_{i = 1}^{N}\left( {s_{i} - {a_{1}^{*}x_{i}} - \mu_{1}^{*}} \right)}} = {\left. {0 - \frac{\partial^{2}L}{{\partial a_{1}}{\partial\sigma_{\eta 1}}}} \right|_{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}})} = {{\frac{2}{\sigma_{\eta 1}^{*3}}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}^{*}x_{i}} - \mu_{1}^{*}} \right)x_{i}}}} = 0}}}}}} & (56)         \end{matrix}$

The last 2^(nd) term is zero obviously because μ₁*={overscore (s)}−a₁*{overscore (x)}. The last term is also zero because it is proportional to the correlation between the residue, which is presumably white noise with zero mean, and the target x_(i), which is zero. Nevertheless, it can be shown by following: $\begin{matrix} \begin{matrix} {{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}^{*}x_{i}} - \mu_{1}^{*}} \right)x_{i}}} = {\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}^{*}x_{i}} - \overset{\_}{s} + {a_{1}^{*}\overset{\_}{x}}} \right)x_{i}}}} \\ {= {{\sum\limits_{i = 1}^{N}{s_{i}x_{i}}} - {a_{1}^{*}{\sum\limits_{i = 1}^{N}x_{i}^{2}}} -}} \\ {{\overset{\_}{s}{\sum\limits_{i = 1}^{N}x_{i}}} + {a_{1}^{*}\overset{\_}{x}{\sum\limits_{i = 1}^{N}x_{i}}}} \\ {= {N\left( {\overset{\_}{sx} - {a_{1}^{*}\overset{\_}{x^{2}}} - {\overset{\_}{s}\overset{\_}{x}} + {a_{1}^{*}{\overset{\_}{x}}^{2}}} \right)}} \\ {= {N\left( {\overset{\_}{sx} - {\overset{\_}{s}\overset{\_}{x}} - {a_{1}^{*}\left( {\overset{\_}{x^{2}} - {\overset{\_}{x}}^{2}} \right)}} \right)}} \\ {= {N\left( {\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)} - {a_{1}^{*}\sigma_{x}^{2}}} \right)}} \\ {= 0} \end{matrix} & (57) \end{matrix}$

With all elements of Hessian matrix computed, we can now write out the Hessian matrix ∇∇L(a₁*,μ₁*,σ_(η1)*): $\begin{matrix} {{\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}} \right)}}} = {\begin{bmatrix} \frac{N\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}{\sigma_{\eta 1}^{*2}} & \frac{\sum\limits_{i = 1}^{N}x_{i}}{\sigma_{\eta 1}^{*2}} & 0 \\ \frac{\sum\limits_{i = 1}^{N}x_{i}}{\sigma_{\eta 1}^{*2}} & \frac{N}{\sigma_{\eta 1}^{*2}} & 0 \\ 0 & 0 & \frac{2N}{\sigma_{\eta 1}^{*2}} \end{bmatrix}.}} & (58) \end{matrix}$

From which, one can get the uncertainties: $\begin{matrix} {{{\Delta\quad a_{1}} = \frac{\sigma_{\quad{\eta 1}}^{*}}{\sqrt{N\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}}}{{\Delta\quad\mu_{1}} = \frac{\sigma_{\eta 1}^{*}}{\sqrt{N}}}{{{\Delta\quad\sigma_{\eta 1}} = \frac{\sigma_{\eta 1}^{*}}{\sqrt{2N}}},}} & (59) \end{matrix}$

From (22), we have: $\begin{matrix} \begin{matrix} {{p^{\prime}\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\ {\frac{\left( {2\pi} \right)^{3/2}}{\sqrt{\text{det}\quad\left( {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\quad 1}^{*}} \right)}}} \right)}}{\mathbb{e}}^{L{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\quad 1}^{*}})}}} \end{matrix} & (60) \end{matrix}$

in which $\begin{matrix} \begin{matrix} {{\det\left\lbrack {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\quad 1}^{*}} \right)}}} \right\rbrack} = {\frac{2{N^{3}\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}}{\sigma_{\eta 1}^{*6}} - \frac{2{N\left( {\sum\limits_{i = 1}^{N}x_{i}} \right)}^{2}}{\sigma_{\eta 1}^{*6}}}} \\ {= \frac{{2{N^{3}\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}} - {2N^{3}{\overset{\_}{x}}^{2}}}{\sigma_{\eta 1}^{*6}}} \\ {= \frac{2N^{3}\sigma_{x}^{2}}{\sigma_{\eta 1}^{*6}}} \end{matrix} & (61) \end{matrix}$

and: $\begin{matrix} {\frac{1}{\sqrt{\det\left\lbrack {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\quad 1}^{*}} \right)}}} \right\rbrack}}\begin{matrix} {= {\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\quad}}}} \\ {\propto {\Delta\quad a_{1}{\Delta\mu}_{1}{\Delta\sigma}_{\eta 1}}} \end{matrix}} & (62) \end{matrix}$

Substitute (62) into (60): $\begin{matrix} \begin{matrix} {{p^{\prime}\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\ {\frac{\left( {2\pi} \right)^{3/2}\sigma_{\eta 1}^{*3}}{\left( {2N^{3}} \right)^{1/2}\sigma_{x}}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}} \\ {= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\quad}}}} \\ {\left( {2\pi} \right)^{3/2}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}} \\ {= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\quad}}}} \\ {\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}} \\ {\propto {\frac{\Delta\quad a_{1}}{a_{1_{\max}}}\frac{{\Delta\mu}_{1}}{\mu_{1_{\max}}}\frac{{\Delta\sigma}_{\eta 1}}{\sigma_{{\eta 1}_{\max}}}\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}}} \end{matrix} & (63) \end{matrix}$

To compare with (46), first, we noticed that the 2^(nd) part in B in (46) is actually zero, i.e. B=σ_(η1)*², thus: $\begin{matrix} \begin{matrix} {{p\left( \eta \middle| M_{1} \right)} = {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\ {\frac{1}{4\sqrt{2}\pi^{\frac{N}{2} - 1}N^{\frac{N - 1}{2}}}\frac{\sigma_{\eta 1}^{*3}}{\sigma_{x}\sigma_{\eta 1}^{*N}}{\Gamma\left( {\frac{N}{2} - \frac{3}{2}} \right)}} \end{matrix} & (64) \end{matrix}$

Use the Stirling series again, $\begin{matrix} \begin{matrix} {{\Gamma\left( {\frac{N}{2} - \frac{3}{2}} \right)} \approx {{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( {\frac{N}{2} - \frac{3}{2}} \right)}^{\frac{N}{2} - \frac{3}{2} - \frac{1}{2}}\sqrt{2\pi}}} \\ {\approx {{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( \frac{N}{2} \right)}^{\frac{N}{2} - 2}\sqrt{2\pi}}} \end{matrix} & (65) \end{matrix}$

this makes (64) looks like: $\begin{matrix} \begin{matrix} {{p\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\ {\frac{1}{4\sqrt{2}\pi^{\frac{N}{2} - 1}N^{\frac{N - 1}{2}}}\frac{\sigma_{\eta 1}^{*3}}{\sigma_{x}\sigma_{\eta 1}^{*N}}{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( \frac{N}{2} \right)}^{\frac{N}{2} - 2}\sqrt{2\pi}} \\ {= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\quad}}}} \\ {\left( {2\pi} \right)^{3/2}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}\frac{{\mathbb{e}}^{3/2}}{2}} \\ {\propto {\frac{\Delta\quad a_{1}}{a_{1_{\max}}}\frac{{\Delta\mu}_{1}}{\mu_{1_{\max}}}\frac{{\Delta\sigma}_{\eta 1}}{\sigma_{{\eta 1}_{\max}}}\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}}} \\ {\frac{{\mathbb{e}}^{3/2}}{2}} \end{matrix} & (66) \end{matrix}$

This is just like (63) except for the factor $\frac{{\mathbb{e}}^{3/2}}{2},$ which likely arises from the Stirling series.

However, with both (38) and (63) calculated, (10) can be written quite neatly: $\begin{matrix} \begin{matrix} {\frac{p\left( H_{1} \middle| n \right)}{p\left( H_{0} \middle| n \right)} = \frac{p\left( {\left. M_{1} \middle| n \right.,t_{0}} \right)}{p\left( {\left. M_{0} \middle| n \right.,t_{0}} \right)}} \\ {= \frac{p\left( {\left. \eta \middle| M_{1} \right.,t_{0}} \right)}{p\left( {\left. \eta \middle| M_{0} \right.,t_{0}} \right)}} \\ {\approx \frac{\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\left( {2\pi} \right)^{3/2}\sigma_{\eta 1}^{*3}}{\left( {2N^{3}} \right)^{1/2}\sigma_{x}}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}}{\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{{\eta 0}_{\max}}}\frac{2{\pi\sigma}_{\eta 0}^{*2}}{\sqrt{2}N}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\ {= {\frac{1}{a_{1_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\quad\sigma_{x}^{2}}}\sqrt{2\pi}\left( \frac{\sigma_{\eta 0}^{*}}{\sigma_{\eta 1}^{*}} \right)^{N - 2}}} \\ {\propto {\frac{\Delta\quad a_{1}}{a_{1_{\max}}}\sqrt{2\pi}\left( \frac{\sigma_{\eta 0}^{*2}}{\sigma_{\eta 1}^{*2}} \right)^{\frac{N - 2}{2}}}} \end{matrix} & (67) \end{matrix}$

It depends on the amplitude of the peak like (a₁*)^(N−2) when the window is on top of a peak, where σ_(η0)*²=σ_(s) ²˜a₁*²σ_(x) ²+σ_(η1)*², while σ_(η1)*² is the best estimation of noise variance. It is important to note that only three calculations must ever be done as the window moves: $\begin{matrix} {{\overset{\_}{s} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}s_{i}}}}{\sigma_{s}^{2} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}\left( {s_{i} - \overset{\_}{s}} \right)^{2}}}}{\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - \overset{\_}{s}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}}}}} & (68) \end{matrix}$

2.2 Preliminary Results for Finding Peaks in Poisson Noise

2.2.1 Comparing Models

The basic logic is similar to the case of white Gaussian noise. The noise is white because the essence of the Poisson process is that we are counting discrete events that are uncorrelated. The probability that we observe n events in the time interval [t,t+Δt] is given by the single step Poisson distribution: $\begin{matrix} {{p_{1}\left( n \middle| r \right)} = {\frac{r^{n}{\mathbb{e}}^{- r}}{n!}.}} & (69) \end{matrix}$ where r is the rate. We note that the expectation value of n is <n>=Σnp(n|r)=r. The rate will depend upon the local signal strength. If no signal is present (there is only dark current), then the rate will be denoted r₀. The signal rides on top of the dark current and it is assumed that the local count rate is directly proportional to the signal: r _(i) ≡r(t _(i))=r ₀ +ax(t _(i))  (70) We can try to estimate both r₀ and a from the same data, or we can estimate r₀ from data taken either from a separate time series, or a region of the time series that is far from any peak.

The likelihood function is then: $\begin{matrix} \begin{matrix} {{p\left( {\left. n \middle| a \right.,r_{0},M} \right)} = {\prod\limits_{i = 1}^{N}\quad\frac{r_{i}^{n_{i}}{\mathbb{e}}^{- r_{i}}}{n_{i}!}}} \\ {= {\prod\limits_{i = 1}^{N}\quad\frac{\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}{\mathbb{e}}^{- {({{ax}_{i} + r_{0}})}}}{n_{i}!}}} \\ {= {{\mathbb{e}}^{- {Nr}_{0}}{\mathbb{e}}^{- a}{\prod\limits_{i = 1}^{N}\quad\frac{\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}}{n_{i}!}}}} \end{matrix} & (71) \end{matrix}$ where it is assumed that x_(i) has unit area: ${\sum\limits_{i = 1}^{N}x_{1}} = 1.$

First, let's consider the case where there is no peak in the window, only the dark current due to electronic fluxion. We may use a prior distribution $\begin{matrix} {{p\left( {a,\left. r_{0} \middle| M_{0} \right.} \right)} = \frac{\delta(a)}{2r_{\max}}} & (72) \end{matrix}$

Marginalizing the likelihood function over (a, r₀) gives: $\begin{matrix} \begin{matrix} {{p\left( n \middle| M_{0} \right)} = {\int{\int{{p\left( {\left. n \middle| a \right.,r_{0},M_{0}} \right)}{p\left( {a,\left. r_{0} \middle| M_{0} \right.} \right)}{\mathbb{d}a}{\mathbb{d}r_{0}}}}}} \\ {= {\int{\int{{\mathbb{e}}^{- {Nr}_{0}}{\mathbb{e}}^{- a}{\prod\limits_{i = 1}^{N}\quad{\frac{\left( {{ax}_{i} + r_{0}} \right)^{ni}}{n_{i}!}\frac{\delta(a)}{2r_{\max}}{\mathbb{d}a}{\mathbb{d}r_{0}}}}}}}} \\ {= {\frac{1}{2r_{\max}{\prod\quad{n_{i}!}}}{\int_{0}^{r_{\max}}{{\mathbb{e}}^{- {Nr}_{0}}r_{0}^{\sum\limits_{i = t}^{N}n_{i}}\quad{\mathbb{d}r_{0}}}}}} \end{matrix} & (73) \end{matrix}$ Change variable: $\begin{matrix} {{{Nr}_{0} = t}{{dr}_{0} = \frac{dt}{N}}{r_{0}^{\sum n_{i}} = \left( \frac{t}{N} \right)^{\sum n_{i}}}} & (74) \end{matrix}$ then $\begin{matrix} \begin{matrix} {{p\left( n \middle| M_{0} \right)} = {\frac{1}{2r_{\max}{\prod\quad{n_{i}!}}}{\int{{{\mathbb{e}}^{- t}\left( \frac{t}{N} \right)}^{\sum\limits_{i = t}^{N}n_{i}}\frac{\mathbb{d}t}{N}}}}} \\ {= {\frac{1}{2r_{\max}N^{1 + {\sum\limits_{i = 1}^{N}n_{i}}}{\prod\quad{n_{i}!}}}{\int{{\mathbb{e}}^{- t}t^{\sum\limits_{i = 1}^{N}n_{i}}{\mathbb{d}t}}}}} \\ {= \frac{\left( {\sum\limits_{i = 1}^{N}n_{i}} \right)!}{2r_{\max}N^{1 + {\sum\limits_{i = 1}^{N}n_{i}}}{\prod\quad{n_{i}!}}}} \end{matrix} & (75) \end{matrix}$ Before moving on to compute the evidence assuming a peak is present in the window, we note we can do a parameter fitting on r₀: $\begin{matrix} {{\overset{\Cap}{r}}_{0} = \frac{\sum n_{i}}{N}} & (76) \end{matrix}$

This can now be used in the parameter fitting for the case there is a peak in the window.

We now consider how to detect a peak using a given peak shape. The model M₁ in this case is that, within the window, there is a peak in the presence of dark current: $\begin{matrix} \begin{matrix} {{p\left( n \middle| M_{1} \right)} = {\int{\int{{p\left( {\left. n \middle| a \right.,r_{0},M_{1}} \right)}{p\left( {a,\left. r_{0} \middle| M_{1} \right.} \right)}{\mathbb{d}a}{\mathbb{d}r_{0}}}}}} \\ {= {\int{\int{{\mathbb{e}}^{- {Nr}_{0}}{\mathbb{e}}^{- a}{\prod\limits_{i = 1}^{N}\quad{\frac{\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}}{n_{i}!}\frac{1}{r_{\max}a_{\max}}{\mathbb{d}a}{\mathbb{d}r_{0}}}}}}}} \\ {= {\frac{1}{r_{\max}a_{\max}{\prod\limits_{i = 1}^{N}\quad{n_{i}!}}}{\int{\int{{\mathbb{e}}^{- {Nr}_{0}}{\mathbb{e}}^{- a}\prod\limits_{i = 1}^{N}}}}}} \\ {\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}{\mathbb{d}a}{\mathbb{d}r_{0}}} \end{matrix} & (77) \end{matrix}$

The integral in (77) is over the parameter plane (a, r₀), the term $\prod\limits_{i = 1}^{N}\quad\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}$ is of problematic when the peak is comparable to dark current. However, as we are looking for strong evidences of peaks in the spectrum, and $\prod\limits_{i = 1}^{N}\quad\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}$ will be dominated by the peak even ax_(i) is just slightly larger than r₀ due the power of n_(i), and vice versa. Thus we can neglect a very narrow region close to the diagonal of ar₀-plane, thereby approximating (77) as: $\begin{matrix} \begin{matrix} {p\left( {{n\left. M_{1} \right)} \approx {p\left( {{n\left. M_{1}^{\prime} \right)} = {{p\text{(}n\left. {{only}\quad a\quad{peak}\quad{in}\quad{the}\quad{window}} \right)} +}} \right.}} \right.} \\ {p\text{(}n\left. {{only}\quad{dark}\quad{current}\quad{in}\quad{the}\quad{window}} \right)} \\ {= {{p\text{(}n\left. {{only}\quad a\quad{peak}\quad{in}\quad{the}\quad{window}} \right)} +}} \\ {p\left( {n\left. M_{0} \right)} \right.} \end{matrix} & (78) \end{matrix}$ where model M₁′ is an approximation of model M₁: there is either only dark current or only a peak in the window. Hence the ratio in (10) becomes: $\begin{matrix} {{\frac{p\left( {H_{1}\left. n \right)} \right.}{p\left( {H_{0}\left. n \right)} \right.} \approx \frac{p\left( {n\left. M_{1}^{\prime} \right)} \right.}{p\left( {n\left. M_{0} \right)} \right.}} = {1 + \frac{p\left( {n\left. {{only}\quad a\quad{peak}\quad{in}\quad{the}\quad{window}} \right)} \right.}{p\left( {n\left. M_{0} \right)} \right.}}} & (79) \end{matrix}$ where p(n|only a peak in the window) can be computed with the same fashion as p(n|M₀), where only dark current is assumed in the window, but with a different prior: $\begin{matrix} {{p\text{(}a},{{r_{0}\left. {{only}\quad{peak}} \right)} = \frac{\delta\left( r_{0} \right)}{2a_{\max}}}} & (80) \end{matrix}$ This results in: $\begin{matrix} {{p\text{(}n\left. {{only}\quad a\quad{peak}\quad{in}\quad{the}\quad{window}} \right)} = \frac{\left( {\prod x_{i}^{n_{i}}} \right){\left( {\sum n_{i}} \right)!}}{2{a_{\max}\left( {\prod{n_{i}!}} \right)}}} & (81) \end{matrix}$

We are then ready to evaluate the ratio in (79).

2.2.2 Parameter Fitting

With above ratio (79) computed, if model M₁′ is more preferable, the parameter, i.e. the amplitude, can be estimated by maximizing the likelihood function (71), using the dark current estimated from the tail of the spectrum where there is no peak. The natural logarithm of likelihood function (71) is: $\begin{matrix} \begin{matrix} {L = {\log\quad\left\lbrack {p\left( {n\left. {a,{\overset{\Cap}{r}}_{0},M_{1}} \right)} \right\rbrack} \right.}} \\ {= {{- {Nr}_{0}} - a + {\sum{n_{i}{\log\left( {{\overset{\Cap}{r}}_{0} + {ax}_{i}} \right)}}} - {\sum{n_{i}\log\quad\left( {n_{i}!} \right)}}}} \end{matrix} & (82) \end{matrix}$ To maximize it: $\begin{matrix} {\frac{\partial L}{\partial a} = {{{- 1} + {\sum{n_{i}\frac{x_{i}}{{\overset{\Cap}{r}}_{0} + {ax}_{i}}}}} = 0}} & (83) \end{matrix}$ Note ax_(i)>>{circumflex over (r)}₀ if there is a peak, the {circumflex over (r)}₀ in the denominator is then negligible, which results in: $\begin{matrix} {a_{0} \approx {\sum\limits_{k = 1}^{N}n_{k}}} & (84) \end{matrix}$

This is our best estimate of the amplitude of the peak. It should be noted that this approximation is not good for very small peaks.

3. Derivation of the Line Shape for TOF-SIMS Mass Peaks

The TOF-SIMS is a counting experiment and it (typically) counts the arrival times of one ion at a time. For any given ionization pulse, there will be only a few ions of a given mass that reach the detector. Hence, the eventual pulse shape will be the accumulation of many independent experiments, with the origin of time t=0 for each experiment being the arrival time of the beam pulse that initiates the ion formation. We consider a single ion of mass M that emerges from a typical beam pulse. After passing through what could be very complicated ion optics that performs charge/mass segregation, at a time t=t₁ (that is mass-dependent) this ion enters free flight with a velocity v₀, which is also mass-dependent. We assume the ion enters free flight at z=0 and travels a distance L to the detector along a linear trajectory z(t)=v₀(t−t₁). We then ask what the PDF of arrival times for this ion would be if the PDF in the particle phase space at time t₁ is f(z,v*,t=t ₁)=f ₀(z,v)=δ(z)g(v).  (85) Here, f(z,v*,t)dzdv is the probability at time t that the particle lies in an infinitesimal neighborhood of the point (z, v) in phase space and we allow for an uncertainty in the ions initial velocity, but g(v) is assumed to be sharply peaked around v₀. For f to be a PDF we must have ∫dvg(v)=1. This PDF evolves according to the Fokker-Planck equation, which in the present situation is simply $\begin{matrix} {{{\frac{\partial f}{\partial t} + {v\frac{\partial f}{\partial z}}} = 0},} & (86) \end{matrix}$ implying that f(z,v*,t)=f ₀(z−vt,v)=δ(z−v(t−t ₁))g(v).  (87) The PDF of arrival times at the detector placed at z=L is given by the flux of probability crossing this point, which is $\begin{matrix} {{{P(t)}{\mathbb{d}t}} = {{\int{{\mathbb{d}{v\left( {v{\mathbb{d}t}} \right)}}{f\left( {{z = L},{v;t}} \right)}}} = {\frac{L}{\left( {t - t_{1}} \right)^{2}}{g\left( \frac{L}{t - t_{1}} \right)}{{\mathbb{d}t}.}}}} & (88) \end{matrix}$ We suggest the use of a Gaussian for the velocity distribution g(v) from maximum entropy arguments: when assigning a PDF for the outcome of what will be the single case of a repeated series of identical and independent experiments, one should choose that PDF which maximizes the number of possible outcomes that are consistent with the probability assignment. Thus, the choice of a Gaussian here is based on the desire to minimize bias in the outcome, not on physical arguments requiring ‘thermalization’ in the ionization process or the nature of the ion optics. To summarize, our proposed peak (wavelet) shape is $\begin{matrix} {{x\left( {t\text{;}M} \right)} = {\frac{L}{\left( {t - {t_{1}(M)}} \right)^{2}}{{\exp\left\lbrack {- \frac{\left( {\frac{L}{t - {t_{1}(M)}} - {v_{0}(M)}} \right)^{2}}{2{\sigma^{2}(M)}}} \right\rbrack}.}}} & (89) \end{matrix}$

4. Results on TOF-SIMS

TOF-SIMS is an abbreviation for Time-of-Flight Secondary Ionization Mass Spectrometer. It uses a high energy primary ion beam to probe the sample surface. Poisson noise is associated with this case.

Here we illustrate our results for finding peaks in a spectrum. FIG. 2 shows the work flow from a TOF-SIMS mass spectrum (silver foil, gallium beam). In FIG. 2 a, a segment of data containing only one peak from a spectrum is plotted. A window is then running through these data point by point, the ratio of two hypotheses, equation (79) is computed for each window, resulting in the curve shown in FIG. 2 b. Meanwhile, the maximum likelihood estimation of amplitude and the maximized likelihood are also computed. The maximized likelihood is plotted as the curve in FIG. 2 c. Then, a threshold of 20 is set for the ratio, the points that are above this threshold are marked as dots in FIG. 2 b. Points at corresponding positions are also marked in FIG. 2 c. Of those dots in the likelihood plot, only a fraction (diamonds) are selected to fit locally to a parabola, from which the position of the peak is determined. Then the amplitude of the peak is chosen from the estimated amplitude according the peak position. It is overlapped with the peak in FIG. 2 d, but is scaled so that it can be fitted in to the panel.

FIG. 3 depicts a segment of a TOF-SIMS spectrum of angiotensin (silver foil, gallium beam), with the y-axis corresponding to counts and the x-axis corresponding to time points. The data segments shows four continuous peaks (corresponding to the raw data) arising from isotopes, along with the peaks identified after applying the methods of the invention.

The peak identification method described herein is potentially useful in any data set wherein peaks of interest are distinguished. The method described herein is useful for identifying peaks in data sets obtained from spectroscopy, including but not limited to various forms of absorption and emission spectroscopy, including absorption spectroscopy, UV/VIS spectroscopy, IR spectroscopy, X-ray spectroscopy, X-ray crystallography, Nuclear Magnetic Resonance spectroscopy (NMR), and raman spectroscopy. In particular, the methods described herein are useful for applications in mass spectroscopy, including TOF-SIMS and SELDI-MS. For example, the methods described herein are useful for analyzing TOF-SIMS mass spectroscopic data related to proteomics. The methods described herein are not limited to spectroscopic applications; for example, the methods are useful when used for imaging applications. 

1. An automated method of data processing and evaluation comprising the steps of: (a) selecting a plurality of peak shapes within a data set; (b) developing a hypothesis which predicts a specified number of peaks present within a given window; (c) testing the hypothesis that the specified number of peaks is present within the given window; and (d) maximizing a likelihood function to estimate the position and amplitude of said peaks present within the given window.
 2. The method of claim 1, wherein said maximizing of said likelihood function is performed only within one or more windows identified as having one or more peaks present.
 3. The method of claim 1, wherein said data set is obtained from a mass spectra.
 4. The method of claim 3, wherein said data set is obtained from time-of-flight mass spectra.
 5. The method of claim 3, wherein said peak shapes are derived from an evolution of probability density for a single ion.
 6. The method of claim 5, wherein said evolution of probability density maps a probability density function from an input to a flight tube into a probability flux striking a detector.
 7. The method of claim 3, wherein said data set is derived from one or more complex biological mixtures.
 8. The method of claim 1, wherein said data set is obtained from analysis of an image.
 9. A software program for data processing and evaluation comprising: (a) a first program code for selecting a plurality of peak shapes within a data set; (b) a second program code for developing a hypothesis which predicts a specified number of peaks present within a given window; (c) a third program code for testing the hypothesis that the specified number of peaks is present within the given window; and (d) a fourth program code for maximizing a likelihood function to estimate the position and amplitude of said peaks present within the given window. 